Abstract

Background: Protein interactions are essential for coordinating cellular functions. Proteomic studies have already elucidated a large number of protein-protein interactions that require detailed functional analysis. Determining the structure of each individual interaction would be necessary to understand its structural basis, yet this is an unfeasible task. Therefore, computational tools able to predict protein binding regions and recognition modes are required to rationalize putative molecular functions for proteins. With this aim, we previously created SCOWLP, a structural classification of protein binding regions at protein family level, based on the information obtained from high-resolution 3D protein-protein and protein-peptide complexes.

Description: We present here a new version of SCOWLP that has been enhanced by the inclusion of protein-nucleic acid and protein-saccharide interactions. SCOWLP takes interfacial solvent into account for a detailed characterization of protein interactions. In addition, the binding regions obtained per protein family have been enriched by the inclusion of predicted binding regions, which have been inferred from structurally related proteins across all existing folds. These inferences could be very useful for suggesting novel recognition regions and for comparing structurally similar interfaces from different families.

Conclusions: The updated SCOWLP has new functionalities that allow both the detection and the comparison of protein regions recognizing different types of ligands, including other proteins, peptides, nucleic acids and saccharides, within a solvated environment. SCOWLP now also allows the analysis of predicted protein binding regions based on structure-based inferences across fold space. These predictions may have unique potential for assisting protein docking, providing insights into protein interaction networks, and guiding the rational engineering of protein ligands. The newly designed SCOWLP web application has an improved, user-friendly interface and is available at http://www.scowlp.org.



Background 

Proteins are ubiquitous and interact with other molecules to perform their functions, in a manner that depends on timing and location [1]. High-throughput technologies for the identification of protein interactions are generating a plethora of new data that must be studied individually to decipher the specific molecular roles of the proteins and their cellular functions [2]. Structure determination methods at atomic resolution are indispensable for the functional characterization of protein interactions. However, techniques for isolating protein complexes and determining their structures still face many challenges, and hence experimental structural studies are not always possible. Alternatively, the rapid accumulation of protein complex structures in the PDB repository [3,4] provides an unprecedented opportunity for comparative analysis of protein interactions, which can be used to predict binding regions and modes [5], to model protein complexes [6], and to improve our understanding of the principles governing protein recognition [7]. Such comparative studies require tools for analyzing the available experimental structures of protein complexes [8]. Several databases have been developed to structurally identify and classify protein-protein and protein-peptide interactions at family level, such as 3DID [9], SCOPPI [10] and SCOWLP [11]. Their classifications are based on collecting all interaction information per protein family, then calculating the similarity of the binding residues, and finally clustering the different binding regions and their partners. Another database, IBIS [12], contains binding regions and interacting partners inferred from the inspection of complexes formed by closely homologous proteins instead of using structural classification schemes. Unlike the others, SCOWLP was developed for atomic-level inspection of interactions by applying physicochemical principles and by considering water molecules in protein interfaces, since solvent has been shown to be abundant and important in mediating protein interactions [13,14], and to improve protein contact predictions [15] and docking [16].

SCOWLP is a database and a web application containing a structural classification of protein binding regions at SCOP family level, including protein-protein and protein-peptide interactions [11]. In the updated version presented here, we additionally include two biologically relevant protein ligands that are quite abundant in the PDB: saccharides (SAC) and nucleic acids (NA) [17-20]. We also consider solvent in the definition of these interactions, since it has been shown to play a critical role in mediating both protein-NA [21] and protein-SAC [22] interactions, which makes the new SCOWLP particularly suited to the detailed inspection of these kinds of interactions. Another novelty in SCOWLP is the inclusion of predicted binding regions for each protein family. These predictions are inferred from significantly conserved binding regions belonging to structurally similar protein families, regardless of their fold [5,23]. It has been observed that proteins with different folds and functions can recognize molecules through binding regions containing similar local structural features or interacting motifs [24-26]. Therefore, the inferred binding regions could be very useful for suggesting alternative recognition regions for a protein family and for comparing structurally similar binding regions from different families.

In summary, the updated SCOWLP classification and its newly designed web application represent a unique framework for the identification and comparative analysis of protein binding regions at atomic level. In the following sections, we explain the methodology used to build the database and describe the architecture and possible uses of the web application.



Construction and Content 

The new version of SCOWLP contains protein interactions with different ligand types, including proteins, peptides, nucleic acids (NA) and saccharides (SAC), taking into account the interfacial solvent that mediates protein interactions. Interacting residues and molecules are described at the physicochemical level according to atom type and distance criteria. The following types of interactions are considered: hydrogen bonds, with donor-acceptor atom distance < 3.6 Å; salt bridges, with charged atom distance < 4 Å; and van der Waals contacts, with atom distance < 4.5 Å. Residue interactions mediated by a water molecule are also included in the interface definition. The specific definitions of the ligand types and of the protein interfaces are as follows:

♦ Protein-protein interactions: The 4,194 protein families from SCOP v1.75 [27] are used to define protein domain boundaries within PDB files.

♦ Protein-peptide interactions: All PDB chains that 
are labeled "ATOM", not defined in SCOP, and 
shorter than 90 residues are considered peptides [28]. 

♦ Protein-nucleic acid interactions: PDB residues labeled as standard nucleotides are selected. We differentiate RNA from DNA by the presence of the O2' group on the ribose ring. Nucleic acid chains are merged into a single unit (double strand) if there is at least one inter-base atomic interaction between the chains.

♦ Protein-saccharide interactions: The SAC molecules are extracted from PDB files labeled with the terms "saccharide", "carbohydrate" and/or "sugar" and containing HETATM atoms. We obtained 307 unique molecules (three-letter codes), which exclude standard and modified nucleotides as well as SAC modifications larger than the SAC moieties themselves. An oligosaccharide can be represented in the PDB either as a single HETATM group or as a collection of such groups. In the latter case, the SAC units are identified and merged into a single oligosaccharide molecule using the PDB connectivity. SAC connectivity to protein domains is also checked to differentiate covalent (intra) from non-covalent (inter) protein interactions.
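The distance criteria listed at the beginning of this section amount to a per-atom-pair test. The Python sketch below illustrates that test under stated assumptions; it is not SCOWLP's code. The `Atom` record, its donor/acceptor/charge flags, the precedence hbond > salt_bridge > vdw when a pair meets several criteria, and the requirement that water-mediated contacts be hydrogen bonds to the same water are all assumptions, and atom typing itself is taken as given.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Distance cutoffs in angstroms, as quoted in the text.
HBOND_MAX = 3.6        # donor/acceptor atom pairs
SALT_BRIDGE_MAX = 4.0  # oppositely charged atoms
VDW_MAX = 4.5          # any other atom pair


@dataclass(frozen=True)
class Atom:
    residue: str                     # hypothetical label, e.g. "A:SER10"
    xyz: tuple[float, float, float]  # Cartesian coordinates in angstroms
    donor: bool = False
    acceptor: bool = False
    charge: int = 0                  # formal charge: +1, -1 or 0


def classify_contact(a: Atom, b: Atom) -> str | None:
    """Return one interaction type for an atom pair, or None if too far apart.

    Assumption: a pair meeting several criteria is reported once,
    with precedence hbond > salt_bridge > vdw.
    """
    d = math.dist(a.xyz, b.xyz)
    if d < HBOND_MAX and ((a.donor and b.acceptor) or (a.acceptor and b.donor)):
        return "hbond"
    if d < SALT_BRIDGE_MAX and a.charge * b.charge < 0:
        return "salt_bridge"
    if d < VDW_MAX:
        return "vdw"
    return None


def water_mediated_pairs(side_a: list[Atom], side_b: list[Atom],
                         waters: list[Atom]) -> set[tuple[str, str]]:
    """Residue pairs (one per side) hydrogen-bonded to the same water molecule."""
    pairs: set[tuple[str, str]] = set()
    for w in waters:
        near_a = {a.residue for a in side_a if classify_contact(a, w) == "hbond"}
        near_b = {b.residue for b in side_b if classify_contact(b, w) == "hbond"}
        pairs |= {(ra, rb) for ra in near_a for rb in near_b}
    return pairs
```

A water oxygen can be modeled as both donor and acceptor, so a serine hydroxyl on one side and an aspartate oxygen on the other, each within 3.6 Å of the same water, yield one water-mediated residue pair.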
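The ligand rules in the list above (the peptide length limit, RNA versus DNA by the O2' atom, and merging nucleic acid chains or saccharide units into single molecules) can be sketched as follows. This is a hypothetical Python illustration, not SCOWLP's implementation: `merge_units`, `nucleic_acid_type` and `is_peptide` are names introduced here, and the links passed to `merge_units` (inter-base contacts or PDB connectivity records) are assumed to be computed beforehand.

```python
from __future__ import annotations


def merge_units(units: list[str], links: list[tuple[str, str]]) -> list[set[str]]:
    """Group ligand units into molecules as connected components (union-find).

    units: chain or residue identifiers (hypothetical labels such as "C" or "NAG1").
    links: pairs that belong together, e.g. nucleic acid chains with at least
           one inter-base contact, or saccharide units joined by PDB connectivity.
    """
    parent = {u: u for u in units}

    def find(u: str) -> str:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps trees shallow
            u = parent[u]
        return u

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups: dict[str, set[str]] = {}
    for u in units:
        groups.setdefault(find(u), set()).add(u)
    return list(groups.values())


def nucleic_acid_type(residue_atom_names: list[set[str]]) -> str:
    """Label a unit RNA if any nucleotide carries an O2' atom, else DNA.

    Simplification: a mixed RNA/DNA hybrid unit would be labeled RNA.
    """
    return "RNA" if any("O2'" in names for names in residue_atom_names) else "DNA"


def is_peptide(record_type: str, in_scop: bool, n_residues: int) -> bool:
    """Peptide rule from the text: ATOM chain, not in SCOP, shorter than 90 residues."""
    return record_type == "ATOM" and not in_scop and n_residues < 90
```

Union-find is one convenient way to compute connected components here; any graph traversal would give the same grouping.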

SCOWLP currently contains 97,252 protein-protein, 
3,563 protein-peptide, 2,568 protein-NA (1,660 DNA, 
908 RNA) and 10,590 protein-SAC complexes. Crystal 
packing contacts are filtered out using a support vector 
machine-based program, NOXclass [29] (cutoff 70%), 
which takes into account the distinctive properties of 
these protein contacts [30]. 

The classification of family binding regions has been performed by clustering interacting domains according to binding region similarity. As described previously [11], this similarity is computed from the overlap of interacting residues once they are mapped onto the structure-based sequence alignments of the family members. Likewise, the inferred binding regions are obtained among members of different families aligned using non-sequential structural alignments, as previously described [5]. SCOWLP contains a total of 7,121 protein binding regions identified at zero similarity cutoff, of which 2,315 have more than one interface. In addition, it contains 8,985 predicted binding regions, 786 of them in protein families that so far lack binding information in the PDB.
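The clustering step described above can be illustrated with a short Python sketch. The exact similarity measure is defined in ref. [11]; here it is assumed, for illustration only, to be the number of shared alignment columns divided by the size of the smaller region, and clusters are assumed to form by single linkage whenever similarity exceeds the cutoff (so at zero cutoff any shared column joins two interfaces). `to_columns`, `overlap` and `cluster_regions` are hypothetical names.

```python
from __future__ import annotations

from itertools import combinations


def to_columns(residues: set[int], res_to_col: dict[int, int]) -> set[int]:
    """Map a domain's interacting residue numbers onto alignment columns."""
    return {res_to_col[r] for r in residues if r in res_to_col}


def overlap(a: set[int], b: set[int]) -> float:
    """Shared columns relative to the smaller region (assumed measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def cluster_regions(regions: dict[str, set[int]], cutoff: float = 0.0) -> list[set[str]]:
    """Single-linkage clustering: join two interfaces when overlap > cutoff."""
    label = {name: name for name in regions}

    def root(x: str) -> str:
        while label[x] != x:
            x = label[x]
        return x

    for a, b in combinations(regions, 2):
        if overlap(regions[a], regions[b]) > cutoff:
            label[root(b)] = root(a)

    clusters: dict[str, set[str]] = {}
    for name in regions:
        clusters.setdefault(root(name), set()).add(name)
    return list(clusters.values())
```

Raising the cutoff splits clusters apart, which matches the idea of several clustering cutoffs offered in the Search options.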

Utility & Discussion 

Web architecture 

The SCOWLP web application follows the SCOP hierarchical levels to classify protein structures: RT - root, CF - class/fold, SF - superfamily, at which point families are listed. In addition, it extends the SCOP classification with three protein interaction levels: FA - family, BR - binding region and IF - interface. The FA level contains a list of binding regions, defined as distinctive surface regions that a protein family uses to recognize other molecules. The BR level contains a list of interfaces, distinguishing the different partners or ligands that a specific region can recognize. The IF level contains a list of domains interacting with the same ligand, each linked to its original PDB file (e.g. 2oei:AB). Binding regions and interfaces are represented by identifiers (e.g. BR_24483 or IF_24486), since they cannot be assigned an automatic description.
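The three interaction levels nest naturally: a family holds binding regions, a binding region holds interfaces, and an interface lists the PDB domains that share it. The Python dataclasses below are a hypothetical model of that hierarchy for illustration; the class and field names are assumptions and do not describe SCOWLP's actual MySQL schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Interface:                       # IF level
    if_id: str                         # identifier such as "IF_24486"
    ligand_type: str                   # "protein", "peptide", "DNA", "RNA" or "saccharide"
    domains: list[str] = field(default_factory=list)  # PDB links such as "2oei:AB"


@dataclass
class BindingRegion:                   # BR level: one surface region, many interfaces
    br_id: str                         # identifier such as "BR_24483"
    interfaces: list[Interface] = field(default_factory=list)


@dataclass
class Family:                          # FA level: a SCOP family and its binding regions
    name: str
    binding_regions: list[BindingRegion] = field(default_factory=list)

    def ligand_types(self) -> set[str]:
        """All ligand types recognized by any binding region of the family."""
        return {i.ligand_type for br in self.binding_regions for i in br.interfaces}
```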

Query 

The SCOWLP web application supports hierarchical navigation through the different levels. It also provides a search box that accepts keywords from SCOP descriptions, PDB IDs, and sequences, the latter being matched to similar SCOP domain sequences using the BLAST algorithm. Some specific examples of the query capabilities are shown on the SCOWLP main page.

Search options 

The interacting levels of SCOWLP (FA, BR, IF) provide the Search options located at the top of the web application (Figure 1). This feature restricts the query and navigation to a specific list of interacting domains based on: i) Ligand type: proteins, peptides, DNA, RNA and saccharides; ii) Complex type: same (homo) or different (hetero) domain families; iii) Interaction type: same (intra) or different (inter) domain chains; iv) Clustering cutoff: four cutoff values that define the final binding region clusters per family, as described in ref. [5] (the default cutoff is zero). The ability to filter out homodimers and intra-domain interactions can be very useful, since they are numerous and irrelevant for many analyses.
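Filters i) to iii) act as independent predicates on each interface record, and a record is kept only if it passes every filter that is set. The following Python sketch assumes a flat, hypothetical `InterfaceRecord` with boolean flags for homo/hetero and intra/inter; it illustrates the filtering logic only and is not the web application's code. The clustering cutoff (filter iv) is omitted because it changes how interfaces are grouped rather than which records are shown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InterfaceRecord:
    domain: str
    partner: str
    ligand_type: str   # "protein", "peptide", "DNA", "RNA" or "saccharide"
    same_family: bool  # True = homo complex, False = hetero complex
    same_chain: bool   # True = intra interaction, False = inter interaction


def search(records: list[InterfaceRecord],
           ligand_types: set[str] | None = None,
           complex_type: str | None = None,      # "homo", "hetero" or None
           interaction_type: str | None = None,  # "intra", "inter" or None
           ) -> list[InterfaceRecord]:
    """Keep records matching every filter that is set (None means no filter)."""
    out = []
    for r in records:
        if ligand_types is not None and r.ligand_type not in ligand_types:
            continue
        if complex_type == "homo" and not r.same_family:
            continue
        if complex_type == "hetero" and r.same_family:
            continue
        if interaction_type == "intra" and not r.same_chain:
            continue
        if interaction_type == "inter" and r.same_chain:
            continue
        out.append(r)
    return out
```

Calling `search(records, complex_type="hetero", interaction_type="inter")` corresponds to filtering out homodimers and intra interactions as described above.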

Data analysis 

The information at each interacting level is displayed on the web page in consecutive steps. Each level shares a common web page composed of three interconnected frames that facilitate the analysis of the information (Figure 1):

♦ Alignment frame: The structure-based sequence alignments of the corresponding domains are shown at each interacting level. In addition, the FA level includes predicted binding regions and the BR level includes predicted interfaces, both inferred from other structurally related protein families. The interacting residues are highlighted to facilitate the analysis of binding patterns. At the IF level, these residues can be colored by their physicochemical properties (hydrophobic or hydrophilic) and by the participation of water in their interactions (dry, wet or dual). The patterns and the physicochemical properties help to distinguish conserved from variable interactions. The structure of each member can be visualized in the 3D Viewer frame by clicking the Jmol icon.

♦ 3D Viewer frame: The Jmol plug-in [31] is available for 3D visualization of the members shown in the Alignment frame, as follows. The FA level displays a surface representation of the binding regions on a representative structure, giving an overview of the spatial locations used for recognition. The BR level allows 3D visualization of the different interfaces containing different ligands and/or binding modes. The IF level allows the user to visualize atomic details of all domains interacting through a common interface and to label them according to the physicochemical and solvent criteria selected in the Alignment frame. Subtle structural differences between domains interacting through the same interface can thus be detected and analyzed.

♦ Control frame: This frame contains interactive Jmol commands and links to the PDB and FA levels. In addition, the residue-residue interaction list is displayed for each interacting domain, together with the physicochemical and water-mediation properties of each interaction.
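The dry/wet/dual coloring at the IF level implies a simple per-residue rule. The Python sketch below assumes, as an interpretation not spelled out in this section, that a dry residue has only direct contacts with the partner, a wet residue only water-mediated contacts, and a dual residue both; `water_role` is a hypothetical name.

```python
from __future__ import annotations


def water_role(direct: set[str], water_mediated: set[str]) -> dict[str, str]:
    """Label each interface residue by how it contacts the partner.

    direct: residues with at least one direct contact.
    water_mediated: residues with at least one contact through a water molecule.
    """
    labels: dict[str, str] = {}
    for res in sorted(direct | water_mediated):
        if res in direct and res in water_mediated:
            labels[res] = "dual"  # both direct and water-mediated contacts
        elif res in direct:
            labels[res] = "dry"   # direct contacts only
        else:
            labels[res] = "wet"   # contacts only through water
    return labels
```

Comparing these labels across the domains of one interface is one way to separate conserved from variable water-mediated interactions.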

The Frame Interconnectivity feature implemented at the IF level of the new SCOWLP automatically highlights (i.e. centers, zooms and colors) a specific interacting residue in the 3D Viewer when the user clicks on that residue in either the Alignment or the Control frame (Figure 1).
Figure 1 SCOWLP architecture. Web application snapshot at the Interface level for a PHD domain interacting with a peptide ligand. The search options for filtering the information and the hierarchy levels that facilitate navigation are shown. In addition, the three interconnected frames for data analysis are highlighted: Alignment, 3D Viewer and Control frames. A residue selected in either the Alignment or the Control frame (black boxes) is automatically highlighted in the 3D Viewer (circle).



Applications 

The main page contains examples of SCOWLP's main functionalities: i) exploration of the different surface regions that a protein family uses to recognize other molecules, ii) identification of the different ligands that a given region can recognize, and iii) comparative analysis of the interacting properties of a group of domains in complex with the same ligand. These analyses cover the conservation and variability not only of interfacial residues but also of their interactions, including water-mediated ones.

One of the main strengths of the new SCOWLP is the rapid identification of protein families able to recognize one or several ligand types through the same region. For instance, selecting a specific combination of ligands in the search options ("protein and DNA") shows only those SCOP families that can recognize proteins and DNA through the same surface region. An example is the "heat-shock transcription factor" family (search by keyword), which has a binding region at FA level (BR_1892) that recognizes both ligands. Clicking on the BR ID shows two different interfaces for this family binding region, one recognizing a protein and the other DNA, which are therefore responsible for different functions (Figure 2a). Clicking on an IF ID then allows the analysis of the interacting features governing each of the two interfaces.
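The "protein and DNA" query above keeps a family only when a single binding region covers all requested ligand types; a family that binds proteins with one region and DNA with another does not qualify. A minimal Python sketch of that condition follows, assuming a hypothetical catalog that maps each family to its binding region IDs and the ligand types seen at each region; the data in the usage line are invented for illustration.

```python
from __future__ import annotations


def families_sharing_region(catalog: dict[str, dict[str, set[str]]],
                            wanted: set[str]) -> dict[str, list[str]]:
    """Per family, the binding regions whose interfaces cover all wanted ligand types."""
    hits: dict[str, list[str]] = {}
    for family, regions in catalog.items():
        brs = [br for br, ligands in regions.items() if wanted <= ligands]
        if brs:
            hits[family] = brs
    return hits


# Hypothetical data: family B binds both ligands, but through different regions.
catalog = {
    "family A": {"BR_1": {"protein", "DNA"}, "BR_2": {"peptide"}},
    "family B": {"BR_3": {"protein"}, "BR_4": {"DNA"}},
}
print(families_sharing_region(catalog, {"protein", "DNA"}))
```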

Figure 2 SCOWLP applications: a) The same binding region of the protein family "heat-shock transcription factor" (blue ribbon) presents interfaces for DNA (left, pdb:3hts) and protein (right, pdb:1fbq); b) Schematic representation of the method used to infer binding regions and interfaces. Two proteins from different SCOP families (violet and red) present structural resemblances. Since the violet protein interacts with its green ligand through a region that is conserved in the violet-red structural alignment, the violet binding region is transferred by inference to the red protein, suggesting that the red protein could also recognize other proteins through this inferred region, with a binding mode similar to that of the violet-green interface.

Another key feature of the new SCOWLP is that it provides alternative binding regions and interfaces, as schematically shown in Figure 2b. These predicted binding regions are inferred from other structurally similar protein families. For instance, selecting "Binding region type: only predicted" in the Search options retrieves those SCOP families that have no binding information available in the PDB yet (i.e. no structures in complex with other molecules) but do have predicted binding regions (786 families). The predicted binding regions of any of these families can then be explored by navigating through the SCOP hierarchy. An example is the "DEATH effector domain, DED" family (search by keyword), for which three different binding regions have been inferred. Detailed analysis of the binding regions (3D Viewer) and of the structurally similar proteins and their ligands (Control frame) might help to explore putative binding regions and ligands for the DED domain.
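The inference scheme of Figure 2b amounts to transferring a binding region through a structural alignment: binding residues of a template domain are mapped to the aligned residues of a domain from another family, provided enough of the region is conserved in the alignment. The Python sketch below illustrates this under assumptions: the alignment is given as a residue-to-residue dictionary (which also accommodates non-sequential alignments), and `min_conserved` is a hypothetical threshold, not the significance criterion used in refs. [5,23].

```python
from __future__ import annotations


def infer_binding_region(binding_residues: set[int],
                         alignment: dict[int, int],
                         min_conserved: float = 0.5) -> set[int]:
    """Transfer a binding region from a template domain to a query domain.

    binding_residues: template residue numbers in contact with a ligand.
    alignment: template residue -> query residue, from a structural alignment.
    Returns the mapped query residues, or an empty set when too small a
    fraction of the binding region is aligned (hypothetical threshold).
    """
    aligned = [r for r in binding_residues if r in alignment]
    if not binding_residues or len(aligned) / len(binding_residues) < min_conserved:
        return set()
    return {alignment[r] for r in aligned}
```

In the terms of Figure 2b, the violet protein plays the template and the red protein the query.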

The number of protein-protein interactions obtained from large-scale technologies is increasing [32], although protein-protein interaction networks contain a considerable amount of noise due to intrinsic errors [2]. Structural information has already been integrated into these networks to distinguish direct from indirect interactions between proteins, and competing from complementary interactions, i.e. whether two proteins interact with a third one through the same or a different binding region [8,33]. The structural classification and the predicted protein binding regions contained in SCOWLP might contribute towards a more accurate construction of protein-protein interaction networks.
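The competing/complementary distinction can be expressed directly in terms of binding regions: two partners of a hub protein compete if they use a common binding region on the hub, and are complementary otherwise. The Python sketch below assumes a hypothetical mapping from each partner to the set of hub binding region IDs it uses; it illustrates the rule and is not a SCOWLP feature.

```python
from __future__ import annotations

from itertools import combinations


def classify_pair(hub_regions: dict[str, set[str]], p: str, q: str) -> str:
    """Competing if partners p and q share a binding region on the hub."""
    return "competing" if hub_regions[p] & hub_regions[q] else "complementary"


def annotate_hub(hub_regions: dict[str, set[str]]) -> dict[tuple[str, str], str]:
    """Label every pair of hub partners as competing or complementary."""
    return {(p, q): classify_pair(hub_regions, p, q)
            for p, q in combinations(sorted(hub_regions), 2)}
```

Competing partners cannot bind the hub simultaneously, whereas complementary ones may form a larger complex, which is the kind of constraint that structural data can add to an interaction network.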

In summary, the examples explained above illustrate the unique potential of SCOWLP for the identification, analysis and prediction of protein interactions. Our ultimate goal is to facilitate the analysis of protein interactions, which may contribute to a better understanding of the rules governing protein recognition and molecular function.

Conclusions 

Here we present an updated and enhanced version of the SCOWLP database and its user-friendly web application. The new SCOWLP retains its previous structural classification of all protein binding regions of the PDB at protein family level, including protein-peptide and water-mediated interactions, and enhances it with protein-nucleic acid and protein-saccharide interactions. In addition, the original functionality of SCOWLP for predicting protein binding regions has been augmented by the inclusion of binding regions inferred from structurally similar proteins across fold space. The new SCOWLP database and its newly designed web application, which includes helpful new features such as frame interconnectivity, are useful tools for the detailed analysis of the protein interactome. They assist users in suggesting protein recognition regions and in comparing structurally similar interfaces from different protein families, which gives them unique potential for gaining a better understanding of protein interaction networks and for guiding protein docking and rational ligand design.

Availability & Requirements 

The SCOWLP database and web application are freely available at http://www.scowlp.org. MySQL files containing the entire database can be downloaded for independent studies. The SCOWLP classification is updated with each new SCOP release. Programming language: Oracle Java 6.0, JavaScript. Server-side requirements: Oracle Java 6.0 (or higher), Apache Tomcat 6.0 (or higher). Client-side requirements: Oracle Java 6.0 (or higher), Mozilla Firefox 3.6 (or higher), Google Chrome 13 (or higher). Software source code will be provided upon request for non-commercial use.